Database reference guide
Loading Data

One of the main features of Engine is the speed with which it can load data (speeds of up to 11 Mb/sec or 42 Gb/hour have been reported). This section gives an overview of how data is loaded, stored and indexed in the system.

Column Based Data Load

Engine stores its data in columns (hence the term CBAT - Column Based Analytic Storage). It also loads and indexes its data on a column-by-column basis. This differs from a standard relational database, which loads data in rows.

At the lowest level, column-based storage determines the mechanism used to load data into the system. Data is loaded via the script-based iLoader product, which processes data one column at a time.

Over-Indexing

By loading data column by column, Engine can efficiently implement a system known as over-indexing, in which many indexes are created for each column. The indexes that are created depend on how the column will be used, and not all of them are created at load time. Some, such as those used in crosstabulations, may be created only when they are needed.

Over-indexing is a key feature of Engine. Because multiple indexes can be created against the same column, Engine can select the most suitable index for whichever calculation it is executing. This provides optimum performance for many different types of calculation, for example Queries, Data Engineering, Crosstabulation and Segmentation.

Storage Requirements

Indexes require storage space, so data loaded into Engine has a larger footprint than the raw data. As a very general rule of thumb, allow an additional 30% of storage on top of that required by the raw data. For more details on Storage Requirements, see Limits and Constraints - Data Types.

Engine also requires additional space while the load process is executing. This must be taken into account when specifying a system, as running out of disk space during a load can lead to serious problems.
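
As a worked illustration of the storage rule of thumb above, the following Python sketch estimates the disk space to allow for a load. It is not part of Engine or iLoader. The function name and parameters are hypothetical, and the temporary load workspace is an assumed input, because this guide gives no figure for it.

```python
def estimate_load_storage_gb(raw_gb, index_overhead=0.30, load_workspace_gb=0.0):
    """Return a rough disk estimate, in GB, for loading raw_gb of data.

    raw_gb            -- size of the raw input data
    index_overhead    -- fraction added for indexes (0.30 is the rule of thumb)
    load_workspace_gb -- ASSUMPTION: temporary space needed while the load runs;
                         the guide does not quantify it, so 0.0 is a placeholder
    """
    if raw_gb < 0 or index_overhead < 0 or load_workspace_gb < 0:
        raise ValueError("sizes and overhead must be non-negative")
    # Space occupied once the data and its indexes are stored.
    stored_gb = raw_gb * (1.0 + index_overhead)
    # Add whatever headroom is planned for the load process itself.
    return stored_gb + load_workspace_gb


# 200 GB of raw data with the 30% rule of thumb and no extra workspace.
print(f"{estimate_load_storage_gb(200.0):.1f} GB")

# The same data with a hypothetical 50 GB reserved for the load process.
print(f"{estimate_load_storage_gb(200.0, load_workspace_gb=50.0):.1f} GB")
```

Treat the result as a lower bound for planning. The 30% figure is described as a very general rule of thumb, and the actual overhead depends on the data types and on which indexes are created.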
Unbalanced Data

Because data is loaded into Engine on a column-by-column basis, it is possible for an unequal number of rows to be loaded into each column. When this occurs, the data is said to be unbalanced.

Unbalanced data may be temporarily necessary as part of the load process. However, it can cause subtle and serious problems if the system is left in this state. Most analytical operations do not check for unbalanced data, and if they are run against tables with an unequal number of rows in each column, their results may be subtly or even spectacularly incorrect.
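
The idea of unbalanced data can be illustrated with a minimal Python sketch. It models a table as a dictionary of column names to lists of values, which is an assumption made purely for illustration. It does not represent Engine's storage format or API, and all function names are hypothetical.

```python
def column_row_counts(table):
    """Return the number of rows held in each column of the table."""
    return {name: len(values) for name, values in table.items()}


def is_balanced(table):
    """A table is balanced when every column holds the same number of rows."""
    return len(set(column_row_counts(table).values())) <= 1


def require_balanced(table):
    """Raise ValueError if the columns do not all have the same row count."""
    if not is_balanced(table):
        raise ValueError(f"unbalanced table, row counts: {column_row_counts(table)}")


# Hypothetical table in which the 'spend' column was loaded one row short.
customers = {
    "customer_id": [1, 2, 3, 4],
    "region": ["N", "S", "E", "W"],
    "spend": [10.0, 25.5, 7.25],
}

print(column_row_counts(customers))
print("balanced:", is_balanced(customers))

# Pairing columns without a check silently drops the last customer,
# which is the kind of quiet error the text above warns about.
pairs = list(zip(customers["customer_id"], customers["spend"]))
print(len(pairs), "customer/spend pairs")
```

Calling `require_balanced(customers)` before an analysis would stop with an error here instead of producing a result computed over misaligned columns.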
© Alterian. All Rights Reserved.